[[File:Lac Operon.svg|thumb|250px|
Top: The transcription of the gene is turned off. There is no lactose to inhibit the repressor, so the repressor binds to the operator, which obstructs the RNA polymerase from binding to the promoter and making the mRNA encoding the lactase gene.
Bottom: The gene is turned on. Lactose is inhibiting the repressor, allowing the RNA polymerase to bind with the promoter and express the genes, which synthesize lactase. Eventually, the lactase will digest all of the lactose, until there is none to bind to the repressor. The repressor will then bind to the operator, stopping the manufacture of lactase.]]
In genetics, a promoter is a sequence of DNA to which bind to initiate transcription of a single RNA transcript from the DNA downstream of the promoter. The RNA transcript may encode a protein (mRNA), or can have a function in and of itself, such as tRNA or rRNA. Promoters are located near the transcription start sites of genes, upstream on the DNA (towards the 5' region of the sense strand). Promoters can be about 100–1000 base pairs long, the sequence of which is highly dependent on the gene and product of transcription, type or class of RNA polymerase recruited to the site, and species of organism.
Promoters represent critical elements that can work in concert with other regulatory regions (enhancers, silencers, boundary elements/insulators) to direct the level of transcription of a given gene. A promoter is induced in response to changes in abundance or conformation of regulatory proteins in a cell, which enable activating transcription factors to recruit RNA polymerase.
Given the short sequences of most promoter elements, promoters can rapidly evolve from random sequences. For instance, in Escherichia coli, ~60% of random sequences can evolve expression levels comparable to the wild-type Lac operon with only one mutation, and that ~10% of random sequences can serve as active promoters even without evolution.
The above promoter sequences are recognized only by RNA polymerase holoenzyme containing sigma-70. RNA polymerase holoenzymes containing other sigma factors recognize different core promoter sequences.
← upstream downstream → 5'-XXXXXXXPPPPPPXXXXXXPPPPPPXXXXGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGXXXX-3' -35 -10 Gene to be transcribed
for -10 sequence T A T A A T 77% 76% 60% 61% 56% 82%
for -35 sequence T T G A C A 69% 79% 61% 56% 54% 54%
More recently, one study measured most genes controlled by tandem promoters in E. coli. In that study, two main forms of interference were measured. One is when an RNAP is on the downstream promoter, blocking the movement of RNAPs elongating from the upstream promoter. The other is when the two promoters are so close that when an RNAP sits on one of the promoters, it blocks any other RNAP from reaching the other promoter. These events are possible because the RNAP occupies several nucleotides when bound to the DNA, including in transcription start sites. Similar events occur when the promoters are in divergent and convergent formations. The possible events also depend on the distance between them.
Eukaryotic promoter regulatory sequences typically bind proteins called transcription factors that are involved in the formation of the transcriptional complex. An example is the E-box (sequence CACGTG), which binds transcription factors in the basic helix-loop-helix (bHLH) family (e.g. BMAL1-Clock, cMyc). Some promoters that are targeted by multiple transcription factors might achieve a hyperactive state, leading to increased transcriptional activity.
Cis-regulatory modules that are localized in DNA regions distant from the promoters of genes can have very large effects on gene expression, with some genes undergoing up to 100-fold increased expression due to such a cis-regulatory module. These cis-regulatory modules include enhancers, silencers, insulators and tethering elements. Among this constellation of elements, enhancers and their associated transcription factors have a leading role in the regulation of gene expression.
Enhancers are regions of the genome that are major gene-regulatory elements. Enhancers control cell-type-specific gene expression programs, most often by looping through long distances to come in physical proximity with the promoters of their target genes. In a study of brain cortical neurons, 24,937 loops were found, bringing enhancers to promoters. Multiple enhancers, each often at tens or hundred of thousands of nucleotides distant from their target genes, loop to their target gene promoters and coordinate with each other to control expression of their common target gene.
The schematic illustration in this section shows an enhancer looping around to come into close physical proximity with the promoter of a target gene. The loop is stabilized by a dimer of a connector protein (e.g. dimer of CTCF or YY1), with one member of the dimer anchored to its binding motif on the enhancer and the other member anchored to its binding motif on the promoter (represented by the red zigzags in the illustration). Several cell function specific transcription factors (there are about 1,600 transcription factors in a human cell) generally bind to specific motifs on an enhancer and a small combination of these enhancer-bound transcription factors, when brought close to a promoter by a DNA loop, govern the level of transcription of the target gene. Mediator (coactivator) (a complex usually consisting of about 26 proteins in an interacting structure) communicates regulatory signals from enhancer DNA-bound transcription factors directly to the RNA polymerase II (pol II) enzyme bound to the promoter.
Enhancers, when active, are generally transcribed from both strands of DNA with RNA polymerases acting in two different directions, producing two eRNAs as illustrated in the Figure. An inactive enhancer may be bound by an inactive transcription factor. Phosphorylation of the transcription factor may activate it and that activated transcription factor may then activate the enhancer to which it is bound (see small red star representing phosphorylation of transcription factor bound to enhancer in the illustration). An activated enhancer begins transcription of its RNA before activating a promoter to initiate transcription of messenger RNA from its target gene.
Bidirectionally paired genes in the Gene Ontology database shared at least one database-assigned functional category with their partners 47% of the time. Microarray analysis has shown bidirectionally paired genes to be co-expressed to a higher degree than random genes or neighboring unidirectional genes. Although co-expression does not necessarily indicate co-regulation, methylation of bidirectional promoter regions has been shown to downregulate both genes, and demethylation to upregulate both genes. There are exceptions to this, however. In some cases (about 11%), only one gene of a bidirectional pair is expressed. In these cases, the promoter is implicated in suppression of the non-expressed gene. The mechanism behind this could be competition for the same polymerases, or chromatin modification. Divergent transcription could shift nucleosomes to upregulate transcription of one gene, or remove bound transcription factors to downregulate transcription of one gene.
Some functional classes of genes are more likely to be bidirectionally paired than others. Genes implicated in DNA repair are five times more likely to be regulated by bidirectional promoters than by unidirectional promoters. Chaperone proteins are three times more likely, and mitochondrial genes are more than twice as likely. Many basic housekeeping and cellular metabolic genes are regulated by bidirectional promoters. The overrepresentation of bidirectionally paired DNA repair genes associates these promoters with cancer. Forty-five percent of human somatic oncogenes seem to be regulated by bidirectional promoters – significantly more than non-cancer causing genes. Hypermethylation of the promoters between gene pairs WNT9A/CD558500, CTDSPL/BC040563, and KCNK15/BF195580 has been associated with tumors.
Certain sequence characteristics have been observed in bidirectional promoters, including a lack of , an abundance of CpG site, and a symmetry around the midpoint of dominant Cs and As on one side and Gs and Ts on the other. A motif with the consensus sequence of TCTCGCGAGA, also called the CGCG element, was recently shown to drive PolII-driven bidirectional transcription in CpG islands. are common, as they are in many promoters that lack TATA boxes. In addition, the Sequence motif NRF-1, GABPA, YY1, and ACTACAnnTCCC are represented in bidirectional promoters at significantly higher rates than in unidirectional promoters. The absence of TATA boxes in bidirectional promoters suggests that TATA boxes play a role in determining the directionality of promoters, but counterexamples of bidirectional promoters do possess TATA boxes and unidirectional promoters without them indicates that they cannot be the only factor.
Although the term "bidirectional promoter" refers specifically to promoter regions of mRNA-encoding genes, luciferase assays have shown that over half of human genes do not have a strong directional bias. Research suggests that non-coding RNAs are frequently associated with the promoter regions of mRNA-encoding genes. It has been hypothesized that the recruitment and initiation of RNA polymerase II usually begins bidirectionally, but divergent transcription is halted at a checkpoint later during elongation. Possible mechanisms behind this regulation include sequences in the promoter region, chromatin modification, and the spatial orientation of the DNA.
Strict conservation of these motifs are not necessary, and many archaea with high GC% show "degenerated" TATA boxes. Rather, it's the energetic (duplex enthalpy, duplex stability) and structual (intrinsic curvature, bendability) features of the promoter that mainly matter.
One approach is to use a biophysical theory of why promoters work. For archaea, a combination of calculated energetic and structual features can detect promoters. For bacteria, a biophysical model that estimates RNAP-sigma70 binding probability can detect and estimate the strengths of promoters.
Another approach is to use a pattern-matching program based on known promoters, from simple hand-crafted regular expressions to advanced machine learning methods such as decision trees, hidden Markov models (HMM), and neural networks. YAPP, an 2000s eukaryotic core-promoter prediction program, uses HMM. A 2017 publication predicts bacterial and eukaryotic promoters using a convolutional neural network.
The promoter binding process is crucial in the understanding of the process of gene expression. Tuning synthetic genetic systems relies on precisely engineered synthetic promoters with known levels of transcription rates.
Not listed here are the many kinds of cancers involving aberrant transcriptional regulation owing to creation of through pathological chromosomal translocation. Importantly, intervention in the number or structure of promoter-bound proteins is one key to treating a disease without affecting expression of unrelated genes sharing elements with the target gene. Some genes whose change is not desirable are capable of influencing the potential of a cell to become cancerous.
Distal promoters also frequently contain CpG islands, such as the promoter of the DNA repair gene ERCC1, where the CpG island-containing promoter is located about 5,400 nucleotides upstream of the coding region of the ERCC1 gene. CpG islands also occur frequently in promoters for functional noncoding RNAs such as .
Altered expressions of also silence or activate many genes in progression to cancer (see microRNAs in cancer). Altered microRNA expression occurs through hyper/hypo-methylation of in CpG islands in promoters controlling transcription of the .
Silencing of DNA repair genes through methylation of CpG islands in their promoters appears to be especially important in progression to cancer (see methylation of DNA repair genes in cancer).
In the case of a transcription factor binding site, there may be a single sequence that binds the protein most strongly under specified cellular conditions. This might be called canonical.
However, natural selection may favor less energetic binding as a way of regulating transcriptional output. In this case, we may call the most common sequence in a population the wild-type sequence. It may not even be the most advantageous sequence to have under prevailing conditions.
Recent evidence also indicates that several genes (including the proto-oncogene c-myc) have G-quadruplex motifs as potential regulatory signals.
Examples include:
Binding
Location
Diseases associated with aberrant function
CpG islands in promoters
Methylation of CpG islands stably silences genes
Promoter CpG hyper/hypo-methylation in cancer
Canonical sequences and wild-type
Synthetic promoter design and engineering
Diseases that may be associated with variations
Constitutive vs regulated
Tissue-specific promoter
Use of the term
See also
External links
|
|